My final project expands on my midterm submission. For this version I went back and redid the midterm graphs in plotly. I like how clean and interactive plotly graphs are, and I use them throughout the final project.
While I am a Math major, I have a keen interest in Computer Science (which I am minoring in), so I am used to reading documentation (I have read the pandas documentation thoroughly) and to writing intuitive Python code, which helps a lot when sorting through data.
My interest in labor studies comes from a more personal perspective, as an enthusiast of political economy and labor organizing.
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import plotly.offline as pyo
import plotly.graph_objs as go
import plotly.io as pio
import warnings
warnings.filterwarnings("ignore")
mpl.style.use('dark_background')
pio.templates.default = 'plotly_dark'
empl_sector_pop_file = 'datasets/employment-by-economic-sector.csv'
sector_gdp_file = 'datasets/composition-of-national-gross-domestic-product-by-sector.csv'
nat_occ_20_file = 'datasets/oesm20nat/national_M2020_dl.xlsx'
empl_pop = pd.read_csv(empl_sector_pop_file)
sect_gdp = pd.read_csv(sector_gdp_file)
nat_occ = pd.read_excel(nat_occ_20_file)
empl_pop.head()
| | Entity | Code | Year | Number of people employed in agriculture (Herrendorf et al. data) | Number of people employed in industry (Herrendorf et al. data) | Number of people employed in services (Herrendorf et al. data) |
|---|---|---|---|---|---|---|
| 0 | Belgium | BEL | 1846 | 681000.0 | 414000.0 | 178000.0 |
| 1 | Belgium | BEL | 1856 | 712000.0 | 525000.0 | 211000.0 |
| 2 | Belgium | BEL | 1866 | 705000.0 | 627000.0 | 242000.0 |
| 3 | Belgium | BEL | 1880 | 674000.0 | 701000.0 | 384000.0 |
| 4 | Belgium | BEL | 1890 | 640000.0 | 827000.0 | 526000.0 |
empl_pop.size
4350
For this analysis, I will only focus on the data for the United States. But in a future analysis, observing global economic and industrial trends should provide interesting insights.
This first file shows the number of workers employed in the agriculture, industry, and service sectors for each recorded year since 1840.
empl_pop_usa = empl_pop.loc[empl_pop['Entity'] == 'United States'].copy()
empl_pop_usa
| | Entity | Code | Year | Number of people employed in agriculture (Herrendorf et al. data) | Number of people employed in industry (Herrendorf et al. data) | Number of people employed in services (Herrendorf et al. data) |
|---|---|---|---|---|---|---|
| 629 | United States | USA | 1840 | 3570000.0 | 822000.0 | 1268000.0 |
| 630 | United States | USA | 1850 | 4520000.0 | 1712000.0 | 2018000.0 |
| 631 | United States | USA | 1860 | 5880000.0 | 2226000.0 | 3004000.0 |
| 632 | United States | USA | 1870 | 6790000.0 | 3430000.0 | 2710000.0 |
| 633 | United States | USA | 1880 | 8920000.0 | 4470000.0 | 4000000.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 720 | United States | USA | 2011 | 1962000.0 | 19453000.0 | 110120000.0 |
| 721 | United States | USA | 2012 | 1919000.0 | 19798000.0 | 112340000.0 |
| 722 | United States | USA | 2013 | 1957000.0 | 20113000.0 | 114099000.0 |
| 723 | United States | USA | 2014 | 1947000.0 | 20779000.0 | 117508000.0 |
| 724 | United States | USA | 2015 | 2050000.0 | 21105000.0 | 118704000.0 |
96 rows × 6 columns
sectors = empl_pop_usa.columns.values.tolist()[3:]
for sector in sectors:
    print("Highest number of people in " + sector.split()[5] + ": " + str(max(empl_pop_usa[sector])))
    print("Lowest number of people in " + sector.split()[5] + ": " + str(min(empl_pop_usa[sector])))
    print("\n")
Highest number of people in agriculture: 11770000.0
Lowest number of people in agriculture: 1907000.0

Highest number of people in industry: 27534000.0
Lowest number of people in industry: 822000.0

Highest number of people in services: 118704000.0
Lowest number of people in services: 1268000.0
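The same summary can also be produced without a loop. The sketch below is only an illustration: it rebuilds a three-row sample from the US table above (so its extremes are not the ones printed for the full data), and the names `sample`, `short_names`, `by_sector`, `summary`, and `peak_year` are mine, not part of the dataset.

```python
import pandas as pd

# Three rows copied from the US table above, to keep the sketch self-contained.
sample = pd.DataFrame({
    "Year": [1840, 1850, 2015],
    "Number of people employed in agriculture (Herrendorf et al. data)": [3570000.0, 4520000.0, 2050000.0],
    "Number of people employed in industry (Herrendorf et al. data)": [822000.0, 1712000.0, 21105000.0],
    "Number of people employed in services (Herrendorf et al. data)": [1268000.0, 2018000.0, 118704000.0],
})

# Map each long column label to its sector name (the sixth word of the label).
short_names = {col: col.split()[5] for col in sample.columns[1:]}
by_sector = sample.rename(columns=short_names).set_index("Year")

# One call returns the highest and lowest value for every sector.
summary = by_sector.agg(["max", "min"])

# idxmax reports the year in which each sector peaks within this sample.
peak_year = by_sector.idxmax()

print(summary)
print(peak_year)
```

On the full data, the same calls could be applied to `empl_pop_usa` after dropping the `Entity` and `Code` columns.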
sectors = empl_pop_usa.columns.values.tolist()[3:]
data = [go.Scatter(x=empl_pop_usa.Year, y=empl_pop_usa[sector], mode='lines', name=sector.split()[5]) for sector in sectors]
layout = go.Layout(title='Number of people employed in agriculture, industry, and services')
fig = go.Figure(data=data,layout=layout)
pyo.iplot(fig)
Visualizing this data shows us quite a lot. Service employment has exploded since the 1950s, and over that period the service sector has been the clearly dominant source of jobs in the US economy.
From my knowledge of history, I know that in the post-WWII era the United States began outsourcing its manufacturing industry and exploiting the cheaper labor and resources found in the Global South. These conditions led the US workforce to take up service work in place of industrial and manufacturing jobs.
This article briefly mentions that outsourcing and also discusses a second wave of outsourcing happening in the US.
Now I will take a look at how the share of GDP held by the agriculture, industry, and service sectors has trended since 1839, the first year in this file.
sect_gdp_usa = sect_gdp[sect_gdp.Entity == 'United States']
sect_gdp_usa
| | Entity | Code | Year | Share of agriculture in GDP at current prices (Herrendorf et al. data) | Share of industry in GDP at current prices (Herrendorf et al. data) | Share of services in GDP at current prices (Herrendorf et al. data) |
|---|---|---|---|---|---|---|
| 1240 | United States | USA | 1839 | 42.63 | 19.37 | 38.0 |
| 1241 | United States | USA | 1849 | 36.03 | 24.97 | 39.0 |
| 1242 | United States | USA | 1859 | 34.29 | 24.71 | 41.0 |
| 1243 | United States | USA | 1869 | 33.09 | 29.91 | 37.0 |
| 1244 | United States | USA | 1879 | 28.42 | 29.58 | 42.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1350 | United States | USA | 2012 | 1.20 | 18.40 | 80.5 |
| 1351 | United States | USA | 2013 | 1.30 | 18.60 | 80.0 |
| 1352 | United States | USA | 2014 | 1.20 | 18.80 | 80.2 |
| 1353 | United States | USA | 2015 | 1.00 | 17.90 | 81.2 |
| 1354 | United States | USA | 2016 | 0.90 | 17.30 | 81.9 |
115 rows × 6 columns
sectors = sect_gdp_usa.columns.values.tolist()[3:]
for sector in sectors:
    print("Highest GDP of " + sector.split()[2] + ": " + str(max(sect_gdp_usa[sector])))
    print("Lowest GDP of " + sector.split()[2] + ": " + str(min(sect_gdp_usa[sector])))
    print("\n")
Highest GDP of agriculture: 42.63
Lowest GDP of agriculture: 0.9

Highest GDP of industry: 40.92
Lowest GDP of industry: 17.3

Highest GDP of services: 81.9
Lowest GDP of services: 37.0
The service sector making up over 80% of the US economy was a surprising figure. Also very surprising is how small a share of US GDP agriculture accounts for, as I thought it would be higher given our corn and grain production.
sectors = sect_gdp_usa.columns.values.tolist()[3:]
data = [go.Scatter(x=sect_gdp_usa.Year, y=sect_gdp_usa[sector], mode='lines', name=sector.split()[2]) for sector in sectors]
layout = go.Layout(title='Share of US National GDP by the agriculture, industry, and services sectors')
fig = go.Figure(data=data,layout=layout)
pyo.iplot(fig)
This too shows the clear boom of the US service sector in the post-WWII world. It also shows the rapid fall of the agriculture sector in the US since the late 1800s. The United States Industrial Revolution occurred around then, which led to a boom in manufacturing, industry and services as agriculture became less dominant in the national economy.
A note on the accuracy of the dataset I am using: the lines become more "wild" around 1910, which appears to reflect more frequent data points from then on. The smoother lines before 1910 are likely due to sparser data, so the data points from 1910 onward give a more accurate picture of the US economy.
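One way to check the spacing of the observations is to look at the gap between consecutive years. The sketch below does this using only the first and last five years visible in the GDP table above. `median_gap` is a helper name I made up, and these ten rows can only show that the two ends of the series are spaced differently, not where the change happens.

```python
import pandas as pd

def median_gap(years):
    """Median spacing, in years, between consecutive observations."""
    return pd.Series(sorted(years)).diff().dropna().median()

early = [1839, 1849, 1859, 1869, 1879]  # first rows of the GDP table above
late = [2012, 2013, 2014, 2015, 2016]   # last rows of the GDP table above

print("early spacing:", median_gap(early))
print("late spacing:", median_gap(late))
```

On the full data, calling `sect_gdp_usa.Year.diff()` and inspecting where the gap shrinks would locate the transition directly.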
Reflecting on these long-run trends in the US economy makes me curious about specific occupations and industry sectors in the US too. I have acquired a dataset which provides this information and will use it to expand upon my mid-semester project. The next several cells analyze 2020 data (the file loaded above is national_M2020_dl.xlsx) on jobs and occupations in the United States, looking at the total employment in each occupation and comparing wages.
nat_occ
| | AREA | AREA_TITLE | AREA_TYPE | PRIM_STATE | NAICS | NAICS_TITLE | I_GROUP | OWN_CODE | OCC_CODE | OCC_TITLE | ... | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 | ANNUAL | HOURLY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 00-0000 | All Occupations | ... | 20.17 | 32.41 | 50.99 | 22810 | 29020 | 41950 | 67410 | 106050 | NaN | NaN |
| 1 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 11-0000 | Management Occupations | ... | 52.77 | 76.71 | # | 51670 | 74250 | 109760 | 159550 | # | NaN | NaN |
| 2 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 11-1000 | Top Executives | ... | 51.05 | 80.73 | # | 44530 | 67740 | 106180 | 167930 | # | NaN | NaN |
| 3 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 11-1010 | Chief Executives | ... | 89.4 | # | # | 62780 | 114530 | 185950 | # | # | NaN | NaN |
| 4 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 11-1011 | Chief Executives | ... | 89.4 | # | # | 62780 | 114530 | 185950 | # | # | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1324 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 53-7081 | Refuse and Recyclable Material Collectors | ... | 18.8 | 24.77 | 32.47 | 23880 | 30180 | 39100 | 51530 | 67530 | NaN | NaN |
| 1325 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 53-7120 | Tank Car, Truck, and Ship Loaders | ... | 21.93 | 31.22 | 38.08 | 31110 | 36280 | 45610 | 64940 | 79220 | NaN | NaN |
| 1326 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 53-7121 | Tank Car, Truck, and Ship Loaders | ... | 21.93 | 31.22 | 38.08 | 31110 | 36280 | 45610 | 64940 | 79220 | NaN | NaN |
| 1327 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 53-7190 | Miscellaneous Material Moving Workers | ... | 15.8 | 21.38 | 28.63 | 25050 | 27760 | 32850 | 44480 | 59550 | NaN | NaN |
| 1328 | 99 | U.S. | 1 | US | 0 | Cross-industry | cross-industry | 1235 | 53-7199 | Material Moving Workers, All Other | ... | 15.8 | 21.38 | 28.63 | 25050 | 27760 | 32850 | 44480 | 59550 | NaN | NaN |
1329 rows × 31 columns
nat_occ.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1329 entries, 0 to 1328
Data columns (total 31 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   AREA          1329 non-null   int64
 1   AREA_TITLE    1329 non-null   object
 2   AREA_TYPE     1329 non-null   int64
 3   PRIM_STATE    1329 non-null   object
 4   NAICS         1329 non-null   int64
 5   NAICS_TITLE   1329 non-null   object
 6   I_GROUP       1329 non-null   object
 7   OWN_CODE      1329 non-null   int64
 8   OCC_CODE      1329 non-null   object
 9   OCC_TITLE     1329 non-null   object
 10  O_GROUP       1329 non-null   object
 11  TOT_EMP       1329 non-null   object
 12  EMP_PRSE      1329 non-null   object
 13  JOBS_1000     0 non-null      float64
 14  LOC_QUOTIENT  0 non-null      float64
 15  PCT_TOTAL     0 non-null      float64
 16  H_MEAN        1329 non-null   object
 17  A_MEAN        1329 non-null   object
 18  MEAN_PRSE     1329 non-null   float64
 19  H_PCT10       1329 non-null   object
 20  H_PCT25       1329 non-null   object
 21  H_MEDIAN      1329 non-null   object
 22  H_PCT75       1329 non-null   object
 23  H_PCT90       1329 non-null   object
 24  A_PCT10       1329 non-null   object
 25  A_PCT25       1329 non-null   object
 26  A_MEDIAN      1329 non-null   object
 27  A_PCT75       1329 non-null   object
 28  A_PCT90       1329 non-null   object
 29  ANNUAL        82 non-null     object
 30  HOURLY        6 non-null      object
dtypes: float64(4), int64(4), object(23)
memory usage: 322.0+ KB
I will be dropping irrelevant columns to make working with the dataframe easier.
nat_occ = nat_occ.drop(['AREA', 'AREA_TITLE', 'AREA_TYPE', 'PRIM_STATE', 'NAICS', 'NAICS_TITLE', 'I_GROUP', 'OWN_CODE', 'OCC_CODE', 'ANNUAL', 'HOURLY'], axis=1)
nat_occ
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All Occupations | total | 139099570 | 0.1 | NaN | NaN | NaN | 27.07 | 56310 | 0.1 | 10.97 | 13.95 | 20.17 | 32.41 | 50.99 | 22810 | 29020 | 41950 | 67410 | 106050 |
| 1 | Management Occupations | major | 7947300 | 0.2 | NaN | NaN | NaN | 60.81 | 126480 | 0.2 | 24.84 | 35.7 | 52.77 | 76.71 | # | 51670 | 74250 | 109760 | 159550 | # |
| 2 | Top Executives | minor | 2601070 | 0.4 | NaN | NaN | NaN | 62.46 | 129920 | 0.2 | 21.41 | 32.57 | 51.05 | 80.73 | # | 44530 | 67740 | 106180 | 167930 | # |
| 3 | Chief Executives | broad | 202360 | 1 | NaN | NaN | NaN | 95.12 | 197840 | 0.5 | 30.18 | 55.06 | 89.4 | # | # | 62780 | 114530 | 185950 | # | # |
| 4 | Chief Executives | detailed | 202360 | 1 | NaN | NaN | NaN | 95.12 | 197840 | 0.5 | 30.18 | 55.06 | 89.4 | # | # | 62780 | 114530 | 185950 | # | # |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1324 | Refuse and Recyclable Material Collectors | detailed | 120850 | 1.8 | NaN | NaN | NaN | 20.49 | 42620 | 1.2 | 11.48 | 14.51 | 18.8 | 24.77 | 32.47 | 23880 | 30180 | 39100 | 51530 | 67530 |
| 1325 | Tank Car, Truck, and Ship Loaders | broad | 12610 | 9 | NaN | NaN | NaN | 24.36 | 50670 | 2.2 | 14.96 | 17.44 | 21.93 | 31.22 | 38.08 | 31110 | 36280 | 45610 | 64940 | 79220 |
| 1326 | Tank Car, Truck, and Ship Loaders | detailed | 12610 | 9 | NaN | NaN | NaN | 24.36 | 50670 | 2.2 | 14.96 | 17.44 | 21.93 | 31.22 | 38.08 | 31110 | 36280 | 45610 | 64940 | 79220 |
| 1327 | Miscellaneous Material Moving Workers | broad | 26300 | 4.6 | NaN | NaN | NaN | 18.16 | 37770 | 1.5 | 12.05 | 13.35 | 15.8 | 21.38 | 28.63 | 25050 | 27760 | 32850 | 44480 | 59550 |
| 1328 | Material Moving Workers, All Other | detailed | 26300 | 4.6 | NaN | NaN | NaN | 18.16 | 37770 | 1.5 | 12.05 | 13.35 | 15.8 | 21.38 | 28.63 | 25050 | 27760 | 32850 | 44480 | 59550 |
1329 rows × 20 columns
One quick note about this dataset is that it conveniently has several levels of groupings for jobs and occupations:
1) Major groups (major category of jobs)
2) Minor groups (a more specific category of jobs)
3) Broad groups
4) Detailed (the specific jobs themselves)
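Because the same workers appear once at every level, it matters which level is selected before anything is summed. The sketch below illustrates this with the first five rows of the table above, which also include a "total" row for all occupations. The variable names are my own, and only three of the columns are reproduced.

```python
import pandas as pd

# First five rows of the table above (title, grouping level, employment).
sample = pd.DataFrame({
    "OCC_TITLE": ["All Occupations", "Management Occupations", "Top Executives",
                  "Chief Executives", "Chief Executives"],
    "O_GROUP": ["total", "major", "minor", "broad", "detailed"],
    "TOT_EMP": [139099570, 7947300, 2601070, 202360, 202360],
})

# How many rows sit at each level of the hierarchy.
level_counts = sample["O_GROUP"].value_counts()

# Summing every row counts the same workers once per level ...
naive_total = sample["TOT_EMP"].sum()

# ... so restrict to a single level before adding anything up.
detailed_total = sample.loc[sample["O_GROUP"] == "detailed", "TOT_EMP"].sum()

print(level_counts)
print(naive_total, detailed_total)
```

Note how "Chief Executives" appears with the same employment figure at both the broad and detailed levels, which is why the later cells filter on one `O_GROUP` value at a time.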
nat_occ['TOT_EMP'] = pd.to_numeric(nat_occ['TOT_EMP'], errors='coerce')
nat_sort = nat_occ.sort_values(by=['TOT_EMP'], ascending=False)
nat_df = nat_sort[nat_sort.O_GROUP == 'detailed']
nat_df
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 741 | Retail Salespersons | detailed | 3659670.0 | 0.6 | NaN | NaN | NaN | 14.87 | 30940 | 0.2 | 9.49 | 11.01 | 13.02 | 15.86 | 21.43 | 19740 | 22900 | 27080 | 32990 | 44570 |
| 644 | Fast Food and Counter Workers | detailed | 3450120.0 | 0.7 | NaN | NaN | NaN | 11.8 | 24540 | 0.3 | 8.63 | 9.52 | 11.47 | 13.42 | 15.37 | 17940 | 19800 | 23860 | 27920 | 31960 |
| 735 | Cashiers | detailed | 3333100.0 | 0.7 | NaN | NaN | NaN | 12.36 | 25710 | 0.2 | 9.06 | 10.25 | 12.03 | 13.86 | 15.69 | 18850 | 21310 | 25020 | 28840 | 32630 |
| 560 | Home Health and Personal Care Aides | detailed | 3211590.0 | 0.7 | NaN | NaN | NaN | 13.49 | 28060 | 0.3 | 9.68 | 11.33 | 13.02 | 15.04 | 17.79 | 20130 | 23560 | 27080 | 31280 | 36990 |
| 509 | Registered Nurses | detailed | 2986500.0 | 0.5 | NaN | NaN | NaN | 38.47 | 80010 | 0.4 | 25.68 | 29.63 | 36.22 | 44.99 | 55.88 | 53410 | 61630 | 75330 | 93590 | 116230 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1053 | Refractory Materials Repairers, Except Brickma... | detailed | 760.0 | 16.6 | NaN | NaN | NaN | 26.48 | 55080 | 2.6 | 17.4 | 21.42 | 26.26 | 30.79 | 36.8 | 36190 | 44560 | 54610 | 64040 | 76550 |
| 485 | Prosthodontists | detailed | 530.0 | 23.5 | NaN | NaN | NaN | 103.3 | 214870 | 11.4 | 38.59 | 57.55 | # | # | # | 80270 | 119710 | # | # | # |
| 634 | Cooks, Private Household | detailed | 320.0 | 30.7 | NaN | NaN | NaN | 22.51 | 46810 | 17.0 | 13.4 | 14.26 | 15.69 | 23.04 | 47.59 | 27860 | 29650 | 32630 | 47920 | 98980 |
| 1178 | Patternmakers, Wood | detailed | 190.0 | 17.9 | NaN | NaN | NaN | 29.21 | 60750 | 4.7 | 15.57 | 21.73 | 31.31 | 36.39 | 39.93 | 32380 | 45210 | 65120 | 75690 | 83050 |
| 82 | Farm Labor Contractors | detailed | NaN | ** | NaN | NaN | NaN | 24.18 | 50300 | 4.7 | 17.52 | 20.8 | 22.97 | 25.14 | 34.43 | 36440 | 43260 | 47780 | 52280 | 71620 |
789 rows × 20 columns
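The NaN in the last row comes from `errors='coerce'`: the raw file stores markers such as `**`, `#`, and `*` in otherwise numeric columns, and coercion turns them into missing values. The sketch below shows that behavior on three rows taken from the tables above and counts how many cells held a marker. The names `raw`, `numeric_cols`, `as_numbers`, and `was_marker` are mine, and I am not asserting here what each marker means (the BLS file notes define them).

```python
import pandas as pd

# Three rows from the tables above, keeping the raw markers as strings.
raw = pd.DataFrame({
    "OCC_TITLE": ["All Occupations", "Chief Executives", "Farm Labor Contractors"],
    "TOT_EMP": [139099570, 202360, "**"],
    "H_PCT90": [50.99, "#", 34.43],
})
numeric_cols = ["TOT_EMP", "H_PCT90"]

# Convert each column; anything that is not a number becomes NaN.
as_numbers = raw[numeric_cols].apply(pd.to_numeric, errors="coerce")

# A cell was a marker if it is NaN now but held a value before conversion.
was_marker = as_numbers.isna() & raw[numeric_cols].notna()

print(as_numbers)
print(was_marker.sum())
```

Counting the markers before discarding them makes it easier to judge how much of a column is being lost, which matters for the top wage percentiles where `#` is common.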
nat_df.head(20)
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 741 | Retail Salespersons | detailed | 3659670.0 | 0.6 | NaN | NaN | NaN | 14.87 | 30940 | 0.2 | 9.49 | 11.01 | 13.02 | 15.86 | 21.43 | 19740 | 22900 | 27080 | 32990 | 44570 |
| 644 | Fast Food and Counter Workers | detailed | 3450120.0 | 0.7 | NaN | NaN | NaN | 11.8 | 24540 | 0.3 | 8.63 | 9.52 | 11.47 | 13.42 | 15.37 | 17940 | 19800 | 23860 | 27920 | 31960 |
| 735 | Cashiers | detailed | 3333100.0 | 0.7 | NaN | NaN | NaN | 12.36 | 25710 | 0.2 | 9.06 | 10.25 | 12.03 | 13.86 | 15.69 | 18850 | 21310 | 25020 | 28840 | 32630 |
| 560 | Home Health and Personal Care Aides | detailed | 3211590.0 | 0.7 | NaN | NaN | NaN | 13.49 | 28060 | 0.3 | 9.68 | 11.33 | 13.02 | 15.04 | 17.79 | 20130 | 23560 | 27080 | 31280 | 36990 |
| 509 | Registered Nurses | detailed | 2986500.0 | 0.5 | NaN | NaN | NaN | 38.47 | 80010 | 0.4 | 25.68 | 29.63 | 36.22 | 44.99 | 55.88 | 53410 | 61630 | 75330 | 93590 | 116230 |
| 809 | Customer Service Representatives | detailed | 2833250.0 | 0.5 | NaN | NaN | NaN | 18.51 | 38510 | 0.2 | 11.59 | 13.83 | 17.23 | 21.83 | 27.8 | 24120 | 28760 | 35830 | 45400 | 57830 |
| 1315 | Laborers and Freight, Stock, and Material Move... | detailed | 2805200.0 | 0.7 | NaN | NaN | NaN | 16.21 | 33710 | 0.2 | 10.96 | 12.77 | 14.96 | 18.57 | 23.39 | 22790 | 26550 | 31120 | 38620 | 48650 |
| 871 | Office Clerks, General | detailed | 2788090.0 | 0.4 | NaN | NaN | NaN | 18.16 | 37770 | 0.2 | 10.59 | 13.26 | 16.98 | 21.87 | 27.47 | 22030 | 27570 | 35330 | 45480 | 57140 |
| 6 | General and Operations Managers | detailed | 2347420.0 | 0.4 | NaN | NaN | NaN | 60.45 | 125740 | 0.2 | 22.04 | 32.43 | 49.83 | 77.49 | # | 45850 | 67450 | 103650 | 161190 | # |
| 1318 | Stockers and Order Fillers | detailed | 2210960.0 | 0.8 | NaN | NaN | NaN | 14.91 | 31010 | 0.2 | 10.29 | 11.96 | 14.03 | 17.06 | 20.64 | 21410 | 24870 | 29190 | 35490 | 42930 |
| 665 | Janitors and Cleaners, Except Maids and Housek... | detailed | 1990510.0 | 0.6 | NaN | NaN | NaN | 15.1 | 31410 | 0.3 | 9.8 | 11.61 | 13.98 | 17.6 | 22.54 | 20380 | 24140 | 29080 | 36600 | 46870 |
| 646 | Waiters and Waitresses | detailed | 1944240.0 | 0.6 | NaN | NaN | NaN | 13.2 | 27470 | 0.5 | 8.42 | 9.27 | 11.42 | 14.73 | 20.46 | 17520 | 19290 | 23740 | 30650 | 42550 |
| 859 | Secretaries and Administrative Assistants, Exc... | detailed | 1850360.0 | 0.5 | NaN | NaN | NaN | 19.43 | 40420 | 0.2 | 12.32 | 14.99 | 18.68 | 23.49 | 28.41 | 25630 | 31180 | 38850 | 48860 | 59090 |
| 1262 | Heavy and Tractor-Trailer Truck Drivers | detailed | 1797710.0 | 0.7 | NaN | NaN | NaN | 23.42 | 48710 | 0.2 | 14.74 | 18.2 | 22.66 | 27.89 | 33.41 | 30660 | 37850 | 47130 | 58010 | 69480 |
| 134 | Software Developers and Software Quality Assur... | detailed | 1476800.0 | 0.9 | NaN | NaN | NaN | 54.94 | 114270 | 0.7 | 31.35 | 40.39 | 52.95 | 67.53 | 81.78 | 65210 | 84020 | 110140 | 140470 | 170100 |
| 97 | Project Management Specialists and Business Op... | detailed | 1444420.0 | 0.5 | NaN | NaN | NaN | 40.53 | 84290 | 0.3 | 20.28 | 27.3 | 37.22 | 50.2 | 65.01 | 42180 | 56790 | 77420 | 104410 | 135220 |
| 788 | Bookkeeping, Accounting, and Auditing Clerks | detailed | 1443940.0 | 0.5 | NaN | NaN | NaN | 21.2 | 44100 | 0.2 | 13.01 | 16.31 | 20.39 | 25.21 | 30.72 | 27050 | 33920 | 42410 | 52430 | 63900 |
| 774 | First-Line Supervisors of Office and Administr... | detailed | 1427260.0 | 0.4 | NaN | NaN | NaN | 29.81 | 62010 | 0.2 | 17 | 21.59 | 28.1 | 35.9 | 45.27 | 35360 | 44900 | 58450 | 74660 | 94170 |
| 562 | Nursing Assistants | detailed | 1371050.0 | 0.6 | NaN | NaN | NaN | 15.41 | 32050 | 0.3 | 10.94 | 12.81 | 14.83 | 17.78 | 20.25 | 22750 | 26650 | 30850 | 36990 | 42110 |
| 380 | Elementary School Teachers, Except Special Edu... | detailed | 1364870.0 | 0.9 | NaN | NaN | NaN | * | 65420 | 0.6 | * | * | * | * | * | 40030 | 48350 | 60940 | 79120 | 100480 |
Retail salespeople, fast food workers, and cashiers are the three most common jobs in the US in this dataset!
Taking a cursory glance at their wages, it seems the most common jobs are also among the ones that pay very poorly. I will look into the wage makeup of the top several detailed job categories to compare.
For now, let's look at the broad categories of jobs and see which ones are the most populous.
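As a rough first comparison, the mean hourly wage of the most common jobs can be set against the national mean. The sketch below uses the five largest detailed occupations and the "All Occupations" `H_MEAN` of 27.07 from the tables above. The column names `ratio_to_national` and `below_national` are made up for this example, and the values are typed in by hand rather than read from `nat_df`.

```python
import pandas as pd

# Five most common detailed occupations, copied from the table above.
top = pd.DataFrame({
    "OCC_TITLE": ["Retail Salespersons", "Fast Food and Counter Workers", "Cashiers",
                  "Home Health and Personal Care Aides", "Registered Nurses"],
    "TOT_EMP": [3659670, 3450120, 3333100, 3211590, 2986500],
    "H_MEAN": [14.87, 11.80, 12.36, 13.49, 38.47],
})
national_h_mean = 27.07  # H_MEAN of the "All Occupations" row

# Each job's mean hourly wage relative to the national mean.
top["ratio_to_national"] = top["H_MEAN"] / national_h_mean
top["below_national"] = top["H_MEAN"] < national_h_mean

# Employment-weighted mean hourly wage across these five jobs.
weighted_mean = (top["H_MEAN"] * top["TOT_EMP"]).sum() / top["TOT_EMP"].sum()

print(top[["OCC_TITLE", "H_MEAN", "ratio_to_national", "below_national"]])
print(round(weighted_mean, 2))
```

In the real dataframe `H_MEAN` is stored as an object column, so it would need `pd.to_numeric(..., errors='coerce')` before this arithmetic works.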
nat_df_broad = nat_sort[nat_sort.O_GROUP == 'broad']
nat_df_broad
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1313 | Laborers and Material Movers | broad | 6021330.0 | 0.4 | NaN | NaN | NaN | 15.4 | 32040 | 0.2 | 10.47 | 12.23 | 14.38 | 17.69 | 21.85 | 21780 | 25430 | 29910 | 36800 | 45440 |
| 740 | Retail Salespersons | broad | 3659670.0 | 0.6 | NaN | NaN | NaN | 14.87 | 30940 | 0.2 | 9.49 | 11.01 | 13.02 | 15.86 | 21.43 | 19740 | 22900 | 27080 | 32990 | 44570 |
| 643 | Fast Food and Counter Workers | broad | 3450120.0 | 0.7 | NaN | NaN | NaN | 11.8 | 24540 | 0.3 | 8.63 | 9.52 | 11.47 | 13.42 | 15.37 | 17940 | 19800 | 23860 | 27920 | 31960 |
| 734 | Cashiers | broad | 3347090.0 | 0.7 | NaN | NaN | NaN | 12.37 | 25730 | 0.2 | 9.07 | 10.25 | 12.03 | 13.87 | 15.7 | 18860 | 21320 | 25020 | 28850 | 32650 |
| 559 | Home Health and Personal Care Aides | broad | 3211590.0 | 0.7 | NaN | NaN | NaN | 13.49 | 28060 | 0.3 | 9.68 | 11.33 | 13.02 | 15.04 | 17.79 | 20130 | 23560 | 27080 | 31280 | 36990 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1279 | Miscellaneous Rail Transportation Workers | broad | 1540.0 | 14.5 | NaN | NaN | NaN | 26.28 | 54670 | 3.2 | 15.71 | 17.99 | 23.06 | 32.85 | 41.08 | 32670 | 37420 | 47970 | 68340 | 85450 |
| 159 | Agricultural Engineers | broad | 1440.0 | 10.3 | NaN | NaN | NaN | 48.86 | 101620 | 11.5 | 24.6 | 30.14 | 40.58 | 50.96 | 80.11 | 51160 | 62700 | 84410 | 106000 | 166620 |
| 1092 | Timing Device Assemblers and Adjusters | broad | 1000.0 | 14.5 | NaN | NaN | NaN | 18.96 | 39430 | 4.9 | 13.05 | 14.33 | 17.39 | 22.74 | 28.33 | 27140 | 29800 | 36170 | 47300 | 58930 |
| 1176 | Model Makers and Patternmakers, Wood | broad | 990.0 | 25.4 | NaN | NaN | NaN | 29.93 | 62250 | 2.4 | 17.98 | 24.47 | 30.88 | 36.37 | 39.77 | 37400 | 50890 | 64240 | 75660 | 82720 |
| 887 | Animal Breeders | broad | 920.0 | 12.7 | NaN | NaN | NaN | 21.12 | 43930 | 2.9 | 11.57 | 14.66 | 19.6 | 26.22 | 31.1 | 24060 | 30500 | 40770 | 54530 | 64680 |
425 rows × 20 columns
nat_df_broad.head(10)
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1313 | Laborers and Material Movers | broad | 6021330.0 | 0.4 | NaN | NaN | NaN | 15.4 | 32040 | 0.2 | 10.47 | 12.23 | 14.38 | 17.69 | 21.85 | 21780 | 25430 | 29910 | 36800 | 45440 |
| 740 | Retail Salespersons | broad | 3659670.0 | 0.6 | NaN | NaN | NaN | 14.87 | 30940 | 0.2 | 9.49 | 11.01 | 13.02 | 15.86 | 21.43 | 19740 | 22900 | 27080 | 32990 | 44570 |
| 643 | Fast Food and Counter Workers | broad | 3450120.0 | 0.7 | NaN | NaN | NaN | 11.8 | 24540 | 0.3 | 8.63 | 9.52 | 11.47 | 13.42 | 15.37 | 17940 | 19800 | 23860 | 27920 | 31960 |
| 734 | Cashiers | broad | 3347090.0 | 0.7 | NaN | NaN | NaN | 12.37 | 25730 | 0.2 | 9.07 | 10.25 | 12.03 | 13.87 | 15.7 | 18860 | 21320 | 25020 | 28850 | 32650 |
| 559 | Home Health and Personal Care Aides | broad | 3211590.0 | 0.7 | NaN | NaN | NaN | 13.49 | 28060 | 0.3 | 9.68 | 11.33 | 13.02 | 15.04 | 17.79 | 20130 | 23560 | 27080 | 31280 | 36990 |
| 1260 | Driver/Sales Workers and Truck Drivers | broad | 3148070.0 | 0.5 | NaN | NaN | NaN | 21.25 | 44200 | 0.2 | 11.35 | 15.23 | 20.4 | 25.92 | 32.65 | 23600 | 31680 | 42430 | 53920 | 67900 |
| 855 | Secretaries and Administrative Assistants | broad | 3111790.0 | 0.4 | NaN | NaN | NaN | 21.54 | 44800 | 0.2 | 12.92 | 15.77 | 19.71 | 25.62 | 32.46 | 26880 | 32810 | 40990 | 53280 | 67510 |
| 508 | Registered Nurses | broad | 2986500.0 | 0.5 | NaN | NaN | NaN | 38.47 | 80010 | 0.4 | 25.68 | 29.63 | 36.22 | 44.99 | 55.88 | 53410 | 61630 | 75330 | 93590 | 116230 |
| 808 | Customer Service Representatives | broad | 2833250.0 | 0.5 | NaN | NaN | NaN | 18.51 | 38510 | 0.2 | 11.59 | 13.83 | 17.23 | 21.83 | 27.8 | 24120 | 28760 | 35830 | 45400 | 57830 |
| 664 | Building Cleaning Workers | broad | 2803150.0 | 0.4 | NaN | NaN | NaN | 14.66 | 30490 | 0.3 | 9.63 | 11.3 | 13.54 | 16.81 | 21.48 | 20020 | 23500 | 28160 | 34970 | 44670 |
Among the broad job categories, manual laborers, retail salespeople and cashiers, and fast food workers are the most populous. Looking at their mean hourly wage (the "H_MEAN" column), they are also paid very poorly, and much of this work is physically demanding.
nat_df_major = nat_sort[nat_sort.O_GROUP == 'major']
nat_df_major
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 771 | Office and Administrative Support Occupations | major | 18548360.0 | 0.2 | NaN | NaN | NaN | 20.38 | 42390 | 0.1 | 12.1 | 14.69 | 18.62 | 24.4 | 31.26 | 25170 | 30560 | 38720 | 50760 | 65020 |
| 728 | Sales and Related Occupations | major | 13120320.0 | 0.2 | NaN | NaN | NaN | 22 | 45750 | 0.2 | 9.85 | 11.89 | 15.15 | 24.94 | 41.47 | 20490 | 24740 | 31500 | 51880 | 86270 |
| 1243 | Transportation and Material Moving Occupations | major | 12163360.0 | 0.3 | NaN | NaN | NaN | 19.08 | 39680 | 0.3 | 10.81 | 13.02 | 16.38 | 22.32 | 30.32 | 22490 | 27070 | 34080 | 46430 | 63050 |
| 625 | Food Preparation and Serving Related Occupations | major | 11262850.0 | 0.2 | NaN | NaN | NaN | 13.3 | 27650 | 0.2 | 8.78 | 9.8 | 12.26 | 14.91 | 19.05 | 18260 | 20380 | 25500 | 31020 | 39630 |
| 477 | Healthcare Practitioners and Technical Occupat... | major | 8579180.0 | 0.3 | NaN | NaN | NaN | 41.3 | 85900 | 0.2 | 17.11 | 24.16 | 33.59 | 47.95 | 71.5 | 35580 | 50250 | 69870 | 99740 | 148720 |
| 1076 | Production Occupations | major | 8519410.0 | 0.3 | NaN | NaN | NaN | 20.08 | 41760 | 0.2 | 11.85 | 14.12 | 18 | 24 | 31.41 | 24650 | 29360 | 37440 | 49910 | 65330 |
| 325 | Educational Instruction and Library Occupations | major | 8446910.0 | 0.4 | NaN | NaN | NaN | 28.75 | 59810 | 0.5 | 11.91 | 16.46 | 25.18 | 35.78 | 49.31 | 24770 | 34250 | 52380 | 74430 | 102570 |
| 67 | Business and Financial Operations Occupations | major | 8387490.0 | 0.2 | NaN | NaN | NaN | 38.79 | 80680 | 0.2 | 19.38 | 25.74 | 34.73 | 47.31 | 62.68 | 40310 | 53550 | 72250 | 98390 | 130380 |
| 1 | Management Occupations | major | 7947300.0 | 0.2 | NaN | NaN | NaN | 60.81 | 126480 | 0.2 | 24.84 | 35.7 | 52.77 | 76.71 | # | 51670 | 74250 | 109760 | 159550 | # |
| 557 | Healthcare Support Occupations | major | 6440880.0 | 0.4 | NaN | NaN | NaN | 15.5 | 32250 | 0.2 | 10.31 | 12.26 | 14.4 | 17.74 | 22.3 | 21450 | 25490 | 29960 | 36900 | 46380 |
| 904 | Construction and Extraction Occupations | major | 5937830.0 | 0.3 | NaN | NaN | NaN | 25.93 | 53940 | 0.2 | 14.12 | 17.57 | 23.37 | 31.46 | 42.24 | 29360 | 36540 | 48610 | 65440 | 87870 |
| 1001 | Installation, Maintenance, and Repair Occupations | major | 5486930.0 | 0.3 | NaN | NaN | NaN | 25.17 | 52360 | 0.1 | 13.55 | 17.36 | 23.44 | 31 | 39.78 | 28190 | 36100 | 48750 | 64490 | 82730 |
| 118 | Computer and Mathematical Occupations | major | 4587700.0 | 0.5 | NaN | NaN | NaN | 46.53 | 96770 | 0.5 | 22.31 | 31.12 | 43.92 | 59.44 | 75.54 | 46400 | 64730 | 91350 | 123640 | 157120 |
| 658 | Building and Grounds Cleaning and Maintenance ... | major | 4090370.0 | 0.3 | NaN | NaN | NaN | 15.75 | 32760 | 0.2 | 9.95 | 11.89 | 14.39 | 18.34 | 23.89 | 20700 | 24730 | 29940 | 38150 | 49690 |
| 584 | Protective Service Occupations | major | 3351180.0 | 0.4 | NaN | NaN | NaN | 25.11 | 52220 | 0.7 | 11.88 | 14.82 | 21.02 | 31.77 | 45.2 | 24710 | 30820 | 43710 | 66090 | 94020 |
| 676 | Personal Care and Service Occupations | major | 2696340.0 | 0.5 | NaN | NaN | NaN | 15.68 | 32610 | 0.2 | 9.21 | 10.99 | 13.52 | 17.99 | 25.08 | 19160 | 22870 | 28120 | 37410 | 52180 |
| 148 | Architecture and Engineering Occupations | major | 2515040.0 | 0.5 | NaN | NaN | NaN | 43.41 | 90300 | 0.3 | 22.21 | 29.87 | 39.98 | 53.83 | 69.27 | 46190 | 62120 | 83160 | 111960 | 144090 |
| 284 | Community and Social Service Occupations | major | 2231070.0 | 0.5 | NaN | NaN | NaN | 25.09 | 52180 | 0.4 | 13.91 | 17.38 | 22.85 | 30.42 | 39.64 | 28940 | 36140 | 47520 | 63260 | 82450 |
| 419 | Arts, Design, Entertainment, Sports, and Media... | major | 1857500.0 | 0.7 | NaN | NaN | NaN | 30.96 | 64400 | 0.7 | 12.25 | 16.82 | 25.55 | 37.93 | 54.63 | 25490 | 34980 | 53150 | 78900 | 113630 |
| 208 | Life, Physical, and Social Science Occupations | major | 1296060.0 | 0.7 | NaN | NaN | NaN | 38.15 | 79360 | 0.4 | 17.76 | 24.05 | 33.54 | 47.6 | 63.64 | 36930 | 50020 | 69760 | 99010 | 132370 |
| 310 | Legal Occupations | major | 1154740.0 | 0.5 | NaN | NaN | NaN | 54 | 112320 | 0.7 | 18.78 | 26.39 | 40.82 | 71.32 | # | 39070 | 54880 | 84910 | 148340 | # |
| 880 | Farming, Fishing, and Forestry Occupations | major | 478770.0 | 1.2 | NaN | NaN | NaN | 16.02 | 33310 | 0.4 | 11.58 | 13.02 | 14.27 | 17.28 | 23.53 | 24090 | 27070 | 29670 | 35950 | 48950 |
nat_df_minor = nat_sort[nat_sort.O_GROUP == 'minor']
nat_df_minor
| | OCC_TITLE | O_GROUP | TOT_EMP | EMP_PRSE | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | H_MEDIAN | H_PCT75 | H_PCT90 | A_PCT10 | A_PCT25 | A_MEDIAN | A_PCT75 | A_PCT90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 733 | Retail Sales Workers | minor | 7628920.0 | 0.3 | NaN | NaN | NaN | 13.95 | 29010 | 0.2 | 9.31 | 10.71 | 12.63 | 15.13 | 19.41 | 19360 | 22270 | 26270 | 31470 | 40370 |
| 1302 | Material Moving Workers | minor | 6921900.0 | 0.4 | NaN | NaN | NaN | 16 | 33290 | 0.2 | 10.63 | 12.48 | 14.77 | 18.43 | 23.3 | 22110 | 25960 | 30720 | 38330 | 48460 |
| 640 | Food and Beverage Serving Workers | minor | 6135730.0 | 0.4 | NaN | NaN | NaN | 12.47 | 25930 | 0.3 | 8.57 | 9.44 | 11.57 | 13.93 | 17.36 | 17830 | 19630 | 24050 | 28980 | 36100 |
| 68 | Business Operations Specialists | minor | 5632020.0 | 0.3 | NaN | NaN | NaN | 37.66 | 78320 | 0.2 | 19 | 25.38 | 34.35 | 46.68 | 60.88 | 39510 | 52800 | 71450 | 97100 | 126620 |
| 478 | Healthcare Diagnosing or Treating Practitioners | minor | 5611620.0 | 0.4 | NaN | NaN | NaN | 50.58 | 105220 | 0.3 | 26.05 | 31.46 | 40.59 | 57.36 | 88.8 | 54190 | 65430 | 84430 | 119300 | 184710 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 713 | Baggage Porters, Bellhops, and Concierges | minor | 65240.0 | 2.8 | NaN | NaN | NaN | 15.67 | 32580 | 1.0 | 9.85 | 12.02 | 14.48 | 18.24 | 23.31 | 20500 | 24990 | 30110 | 37930 | 48490 |
| 1281 | Water Transportation Workers | minor | 63020.0 | 3.4 | NaN | NaN | NaN | 35.35 | 73530 | 4.0 | 14.17 | 19.73 | 28.48 | 41.65 | 61.74 | 29460 | 41030 | 59250 | 86620 | 128420 |
| 896 | Forest, Conservation, and Logging Workers | minor | 44120.0 | 2.5 | NaN | NaN | NaN | 20.3 | 42230 | 1.1 | 12.34 | 14.92 | 19.33 | 24.58 | 30.3 | 25670 | 31040 | 40200 | 51120 | 63020 |
| 717 | Tour and Travel Guides | minor | 38030.0 | 2.9 | NaN | NaN | NaN | 15.48 | 32200 | 0.8 | 9.82 | 11.64 | 14.16 | 18.14 | 22.91 | 20430 | 24200 | 29460 | 37730 | 47660 |
| 881 | Supervisors of Farming, Fishing, and Forestry ... | minor | 22640.0 | 2.6 | NaN | NaN | NaN | 26.16 | 54420 | 0.8 | 15.26 | 18.43 | 24.08 | 31.91 | 39.95 | 31730 | 38340 | 50080 | 66380 | 83090 |
92 rows × 20 columns
import plotly.express as px
fig = px.pie(nat_df_major, values='TOT_EMP', names='OCC_TITLE', title='Total Employment in Major Occupational Areas')
fig.show()
This pie chart visualizes the major occupational groups by total employment.
Note in the output below that the wage columns are stored as strings (dtype "object") rather than as numbers.
nat_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 789 entries, 741 to 82
Data columns (total 20 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   OCC_TITLE     789 non-null    object
 1   O_GROUP       789 non-null    object
 2   TOT_EMP       788 non-null    float64
 3   EMP_PRSE      789 non-null    object
 4   JOBS_1000     0 non-null      float64
 5   LOC_QUOTIENT  0 non-null      float64
 6   PCT_TOTAL     0 non-null      float64
 7   H_MEAN        789 non-null    object
 8   A_MEAN        789 non-null    object
 9   MEAN_PRSE     789 non-null    float64
 10  H_PCT10       789 non-null    object
 11  H_PCT25       789 non-null    object
 12  H_MEDIAN      789 non-null    object
 13  H_PCT75       789 non-null    object
 14  H_PCT90       789 non-null    object
 15  A_PCT10       789 non-null    object
 16  A_PCT25       789 non-null    object
 17  A_MEDIAN      789 non-null    object
 18  A_PCT75       789 non-null    object
 19  A_PCT90       789 non-null    object
dtypes: float64(5), object(15)
memory usage: 129.4+ KB
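Here, I will convert the wage columns to floating-point numbers so that I can work with them numerically rather than as strings. With errors='coerce', any entry that cannot be parsed as a number (such as the # placeholder visible in the tables above) becomes NaN, which is why the non-null counts for H_MEAN and A_MEAN drop below 789 in the output below.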
nat_df['H_MEAN'] = pd.to_numeric(nat_df['H_MEAN'], errors='coerce')
nat_df['A_MEAN'] = pd.to_numeric(nat_df['A_MEAN'], errors='coerce')
nat_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 789 entries, 741 to 82
Data columns (total 20 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   OCC_TITLE     789 non-null    object
 1   O_GROUP       789 non-null    object
 2   TOT_EMP       788 non-null    float64
 3   EMP_PRSE      789 non-null    object
 4   JOBS_1000     0 non-null      float64
 5   LOC_QUOTIENT  0 non-null      float64
 6   PCT_TOTAL     0 non-null      float64
 7   H_MEAN        730 non-null    float64
 8   A_MEAN        785 non-null    float64
 9   MEAN_PRSE     789 non-null    float64
 10  H_PCT10       789 non-null    object
 11  H_PCT25       789 non-null    object
 12  H_MEDIAN      789 non-null    object
 13  H_PCT75       789 non-null    object
 14  H_PCT90       789 non-null    object
 15  A_PCT10       789 non-null    object
 16  A_PCT25       789 non-null    object
 17  A_MEDIAN      789 non-null    object
 18  A_PCT75       789 non-null    object
 19  A_PCT90       789 non-null    object
dtypes: float64(7), object(13)
memory usage: 129.4+ KB
nat_df['H_MEDIAN'] = pd.to_numeric(nat_df['H_MEDIAN'], errors='coerce')
nat_df['A_MEDIAN'] = pd.to_numeric(nat_df['A_MEDIAN'], errors='coerce')
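The column-by-column conversions above could also be folded into one loop that reports how many entries were lost to coercion. The following is only a minimal sketch on a toy frame: the helper name `coerce_wage_columns` and the toy values are assumptions made for illustration, and the `#` and `*` strings simply stand in for non-numeric placeholders like the `#` seen in the tables.

```python
import pandas as pd

def coerce_wage_columns(df, columns):
    """Return a copy of df with the given columns converted to floats.

    Hypothetical helper: values that cannot be parsed become NaN, and the
    number of values lost that way is reported per column.
    """
    out = df.copy()  # work on a copy so the original frame is left untouched
    lost = {}
    for col in columns:
        converted = pd.to_numeric(out[col], errors='coerce')
        # Entries that were present before but are NaN after conversion
        lost[col] = int(converted.isna().sum() - out[col].isna().sum())
        out[col] = converted
    return out, lost

# Toy data, made up for illustration (not the BLS figures)
toy = pd.DataFrame({
    'OCC_TITLE': ['Occupation A', 'Occupation B', 'Occupation C'],
    'H_MEDIAN': ['12.63', '#', '40.59'],
    'A_MEDIAN': ['26270', '#', '*'],
})

clean, lost = coerce_wage_columns(toy, ['H_MEDIAN', 'A_MEDIAN'])
print(lost)
```

Checking the `lost` counts before plotting makes it visible how many occupations drop out of any wage-based chart once their placeholders turn into NaN.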
fig = px.bar(nat_df.head(50),
             x='TOT_EMP', y='OCC_TITLE',
             title='Total Employment in Top 50 Occupations (with mean hourly wage)',
             orientation='h',
             color='H_MEAN',
             color_continuous_scale=px.colors.sequential.RdBu)
fig.update_xaxes(title='Total Employed')
fig.update_yaxes(title='Occupation Title')
fig.show()
This graph shows the 50 most populous occupations, colored by their mean hourly wage. As I noted before, most of the jobs that employ the largest numbers of people pay severely low wages. The color scale runs from red (the lowest mean hourly wage) through white (the middle of the range) to blue (the highest). Note that white marks the midpoint of the wage range among these 50 occupations rather than the average wage, but the predominance of red in the graph above still makes it clear that the vast majority of the 50 most popular jobs sit in the lower part of that range.
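Because a continuous color scale is by default centered on the middle of the data range rather than on any average, one option is to compute an employment-weighted mean wage and use that as the center instead. The sketch below uses made-up numbers (not the BLS figures) and only shows the arithmetic; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not the BLS figures)
toy = pd.DataFrame({
    'OCC_TITLE': ['Occupation A', 'Occupation B', 'Occupation C'],
    'TOT_EMP': [3000.0, 1000.0, 1000.0],
    'H_MEAN': [10.0, 20.0, 60.0],
})

# Middle of the wage range: where the center color lands by default
range_midpoint = (toy['H_MEAN'].min() + toy['H_MEAN'].max()) / 2

# Employment-weighted mean: each occupation counts in proportion to its workers
weighted_mean = np.average(toy['H_MEAN'], weights=toy['TOT_EMP'])

print(range_midpoint, weighted_mean)
```

In this toy example the weighted mean is well below the range midpoint, because the largest occupation is also the lowest paid. Passing such a value to `px.bar` through its `color_continuous_midpoint` argument should center the white band on it, so that red and blue would then read as "below" and "above" the weighted average.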
fig = px.bar(nat_df.head(50),
             x='TOT_EMP', y='OCC_TITLE',
             title='Total Employment in Top 50 Occupations (with median hourly wage)',
             orientation='h',
             color='H_MEDIAN',
             color_continuous_scale=px.colors.sequential.RdBu)
fig.update_xaxes(title='Total Employed')
fig.update_yaxes(title='Occupation Title')
fig.show()
Repeating the comparison with the median hourly wage, we see a similar makeup as before: poverty wages still predominate among the most populous jobs.
fig = px.bar(nat_df.head(10),
             x='TOT_EMP', y='OCC_TITLE',
             title='Total Employment in Top 10 Occupations (with median hourly wage)',
             orientation='h',
             color='H_MEDIAN',
             color_continuous_scale=px.colors.sequential.RdBu)
fig.update_xaxes(title='Total Employed')
fig.update_yaxes(title='Occupation Title')
fig.show()
Focusing on the 10 most populous jobs gives a clearer and closer look at the information above.
nat_df_broad['H_MEDIAN'] = pd.to_numeric(nat_df_broad['H_MEDIAN'], errors='coerce')
nat_df_broad['A_MEDIAN'] = pd.to_numeric(nat_df_broad['A_MEDIAN'], errors='coerce')
nat_df_major['H_MEDIAN'] = pd.to_numeric(nat_df_major['H_MEDIAN'], errors='coerce')
nat_df_major['A_MEDIAN'] = pd.to_numeric(nat_df_major['A_MEDIAN'], errors='coerce')
fig = px.bar(nat_df_broad.head(10),
             x='TOT_EMP', y='OCC_TITLE',
             title='Total Employment in Top 10 Broad Occupations (with median hourly wage)',
             orientation='h',
             color='H_MEDIAN',
             color_continuous_scale=px.colors.sequential.RdBu)
fig.update_xaxes(title='Total Employed')
fig.update_yaxes(title='Occupation Title')
fig.show()
Graphing the broad categories of occupations, we continue to see similar results. One thing worth noting is that Registered Nurses stand out as an outlier in this graph: they are paid fairly well compared with the other occupations shown, while also being a relatively popular occupation.
Now, I will look at the highest paid occupations and compare them to their popularity as an occupation.
nat_df_wealthy = nat_df.sort_values(by=['H_MEDIAN'], ascending=False)
fig = px.bar(nat_df_wealthy.head(10),
             x='H_MEDIAN', y='OCC_TITLE',
             title='Top 10 Highest-Paid Occupations (by median hourly wage)',
             color='TOT_EMP',
             color_continuous_scale=px.colors.sequential.Darkmint,
             orientation='h')
fig.update_xaxes(title='Median Hourly Wage')
fig.update_yaxes(title='Occupation Title')
fig.show()
So it seems (unsurprisingly) that the highest-paid occupations are medical professions, managers, and executives.
fig = px.bar(nat_df_major,
             x='TOT_EMP', y='OCC_TITLE',
             title='Total Employment in Major Occupational Groups (with median hourly wage)',
             orientation='h',
             color='H_MEDIAN',
             color_continuous_scale=px.colors.sequential.RdBu)
fig.update_xaxes(title='Total Employed')
fig.update_yaxes(title='Occupation Title')
fig.show()
fig = px.box(nat_df,
             title='Analysis of Wage Gap in the United States',
             x="H_MEDIAN",
             points="all",
             orientation="h",
             notched=True,
             hover_data=["OCC_TITLE"])
fig.update_xaxes(title='Median Hourly Wage')
fig.show()
This is perhaps the most useful graph of them all, as it plainly reveals the wage gap in the United States. The box-and-whisker plot shows a distribution heavily skewed toward poverty wages. The vast majority of the working class make low wages, while only a select few occupations make decent or high wages.
It is astounding that the median wage is only $23.28/hr, which is less than a quarter of the maximum wage in the data.
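The comparison above treats every occupation as a single data point, regardless of how many people hold it. As a rough sketch of how the picture might change when occupations are weighted by employment, the example below compares the plain median across occupations with an employment-weighted median. The numbers are made up for illustration, `weighted_median` is a hypothetical helper rather than a pandas function, and rows with missing wages are assumed to have been dropped beforehand.

```python
import numpy as np
import pandas as pd

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total.

    Hypothetical helper; assumes there are no NaN values in either input.
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(v)          # sort wages from lowest to highest
    v, w = v[order], w[order]
    cum = np.cumsum(w)             # running total of workers
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

# Toy data, made up for illustration (not the BLS figures)
toy = pd.DataFrame({
    'H_MEDIAN': [12.0, 15.0, 23.0, 40.0, 95.0],
    'TOT_EMP': [5000.0, 3000.0, 1000.0, 700.0, 300.0],
})

occ_median = toy['H_MEDIAN'].median()        # each occupation counts once
ratio = occ_median / toy['H_MEDIAN'].max()   # median relative to the top wage
worker_median = weighted_median(toy['H_MEDIAN'], toy['TOT_EMP'])

print(occ_median, round(ratio, 3), worker_median)
```

In this toy data the worker-level median falls below the occupation-level median, because the lowest-paid occupations are also the largest. Whether the same holds for the real data would need to be checked against `nat_df`.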
One other thing to note is that this dataset only accounts for income from occupations and does not include the wealth that comes from owning stocks, assets, and private property. Taking all of those into account makes the gap between the rich and the poor even larger.
Given more time and more data, I would have liked to chart the difference in wealth between the rich and the poor over the decades, to see what trends there are and whether it is true that over the years the rich have been getting richer and the poor poorer. However, an accurate analysis of that would need to account not only for the wealth generated by occupational income, but also for the wealth held in assets and private property.